STAT_PARETO

Overview

The STAT_PARETO function fits a collection of statistical models inspired by the Pareto distribution and related logarithmic transformations to observed data. These models capture power-law and logarithmic decay behaviors commonly found in wealth distributions, natural phenomena, and systems exhibiting heavy-tailed characteristics. The function uses non-linear least squares optimization via SciPy’s curve_fit to find optimal parameter values.

The Pareto distribution, named after Italian economist Vilfredo Pareto, is characterized by its power-law probability tail. For the classic Pareto Type I distribution, the cumulative distribution function is:

F(x) = 1 - \left(\frac{x_m}{x}\right)^\alpha \quad \text{for } x \geq x_m

where x_m is the minimum value (scale parameter) and \alpha is the shape parameter (tail index). This distribution underlies the famous “80-20 rule” (Pareto principle), where approximately 20% of causes produce 80% of effects.

The function supports eight model variants:

  • Reciprocal Log Power Law: y = \frac{1}{a + b \ln(x)} — models inverse logarithmic relationships
  • Log Offset Linear and Log One Plus X Linear: y = a + b \ln(x) — classic logarithmic growth with offset
  • Shifted Log Linear Decay: y = a - b \ln(x + c) — logarithmic decay with a horizontal shift
  • Simple Natural Log: y = A \ln(x) — single-parameter natural logarithm scaling
  • Pareto Power Law (Single Param): y = 1 - x^{-A} — simplified Pareto CDF form
  • Pareto Power Law With Scale: y = 1 - \left(\frac{b}{x}\right)^a — full Pareto CDF with scale parameter b
  • Pearson Type IV Asymmetric Peak: A six-parameter model for asymmetric peaked distributions, combining power-law decay with exponential arctangent modulation

The fitting algorithm employs the Levenberg-Marquardt method (for unbounded problems) or trust-region reflective methods when bounds are specified. Each model includes intelligent initial parameter guesses derived from data statistics, and the function returns both fitted parameter values and their standard errors estimated from the covariance matrix.

These models are applicable to economics (income/wealth distributions), reliability engineering, network traffic analysis, hydrology (extreme rainfall events), and any domain where power-law or heavy-tailed distributions arise. For implementation details, see the SciPy curve_fit documentation and the SciPy GitHub repository.

This example function is provided as-is without any representation of accuracy.

Excel Usage

=STAT_PARETO(xdata, ydata, stat_pareto_model)
  • xdata (list[list], required): The xdata value
  • ydata (list[list], required): The ydata value
  • stat_pareto_model (str, required): The stat_pareto_model value

Returns (list[list]): 2D list [param_names, fitted_values, std_errors], or error string.

Examples

Example 1: Demo case 1

Inputs:

stat_pareto_model xdata ydata
reciprocal_log_power_law 0.1 3.0367302369217555
1.3250000000000002 0.32076392890124844
2.5500000000000003 0.3034487893300985
3.7750000000000004 0.32488474825975155
5 0.21237394505129054

Excel formula:

=STAT_PARETO("reciprocal_log_power_law", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {3.0367302369217555;0.32076392890124844;0.3034487893300985;0.32488474825975155;0.21237394505129054})

Expected output:

"non-error"

Example 2: Demo case 2

Inputs:

stat_pareto_model xdata ydata
log_offset_linear 0.1 0.36902764003516775
1.3250000000000002 3.0352556604224823
2.5500000000000003 3.780807602771912
3.7750000000000004 4.257479048422858
5 4.4225894632319145

Excel formula:

=STAT_PARETO("log_offset_linear", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {0.36902764003516775;3.0352556604224823;3.780807602771912;4.257479048422858;4.4225894632319145})

Expected output:

"non-error"

Example 3: Demo case 3

Inputs:

stat_pareto_model xdata ydata
log_one_plus_x_linear 0.1 2.86570248495542
1.3250000000000002 3.6315561990598044
2.5500000000000003 4.100671485316171
3.7750000000000004 4.439478709816687
5 4.623980897897044

Excel formula:

=STAT_PARETO("log_one_plus_x_linear", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {2.86570248495542;3.6315561990598044;4.100671485316171;4.439478709816687;4.623980897897044})

Expected output:

"non-error"

Example 4: Demo case 4

Inputs:

stat_pareto_model xdata ydata
shifted_log_linear_decay 0.1 2.118089042277242
1.3250000000000002 1.6000721166439364
2.5500000000000003 1.2381755368660448
3.7750000000000004 0.9360752867947714
5 0.7548674938171431

Excel formula:

=STAT_PARETO("shifted_log_linear_decay", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {2.118089042277242;1.6000721166439364;1.2381755368660448;0.9360752867947714;0.7548674938171431})

Expected output:

"non-error"

Example 5: Demo case 5

Inputs:

stat_pareto_model xdata ydata
simple_natural_log 0.1 -6.235879990384085
1.3250000000000002 0.7470981582493581
2.5500000000000003 2.6997341977359604
3.7750000000000004 3.948159412536055
5 4.380591451321681

Excel formula:

=STAT_PARETO("simple_natural_log", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {-6.235879990384085;0.7470981582493581;2.6997341977359604;3.948159412536055;4.380591451321681})

Expected output:

"non-error"

Example 6: Demo case 6

Inputs:

stat_pareto_model xdata ydata
pareto_power_law_single_param 0.1 -555.7562945839984
1.3250000000000002 -1.0158554001575881
2.5500000000000003 8.206369055095575
3.7750000000000004 18.09896551667567
5 -1.6447723377170855

Excel formula:

=STAT_PARETO("pareto_power_law_single_param", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {-555.7562945839984;-1.0158554001575881;8.206369055095575;18.09896551667567;-1.6447723377170855})

Expected output:

"non-error"

Example 7: Demo case 7

Inputs:

stat_pareto_model xdata ydata
pareto_power_law_with_scale 0.1 0.009535913828598551
1.3250000000000002 0.4699009341746975
2.5500000000000003 0.925280769824784
3.7750000000000004 0.9996079336594544
5 0.9818241912860092

Excel formula:

=STAT_PARETO("pareto_power_law_with_scale", {0.1;1.3250000000000002;2.5500000000000003;3.7750000000000004;5}, {0.009535913828598551;0.4699009341746975;0.925280769824784;0.9996079336594544;0.9818241912860092})

Expected output:

"non-error"

Example 8: Demo case 8

Inputs:

stat_pareto_model xdata ydata
pearson_type_iv_asymmetric_peak 0.01 3.92428409687526
2.0075 7.178926599327028
4.005 0.2107768608191403
6.0024999999999995 0.22034461095419103
8 0.01

Excel formula:

=STAT_PARETO("pearson_type_iv_asymmetric_peak", {0.01;2.0075;4.005;6.0024999999999995;8}, {3.92428409687526;7.178926599327028;0.2107768608191403;0.22034461095419103;0.01})

Expected output:

"non-error"

Python Code

import numpy as np
from scipy.optimize import curve_fit as scipy_curve_fit
import math

def stat_pareto(xdata, ydata, stat_pareto_model):
    """
    Fits stat_pareto models to data using scipy.optimize.curve_fit. See https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html for details.

    See: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html

    This example function is provided as-is without any representation of accuracy.

    Args:
        xdata (list[list]): The xdata value
        ydata (list[list]): The ydata value
        stat_pareto_model (str): The stat_pareto_model value Valid options: Reciprocal Log Power Law, Log Offset Linear, Log One Plus X Linear, Shifted Log Linear Decay, Simple Natural Log, Pareto Power Law Single Param, Pareto Power Law With Scale, Pearson Type Iv Asymmetric Peak.

    Returns:
        list[list]: 2D list [param_names, fitted_values, std_errors], or error string.
    """
    def _validate_data(xdata, ydata):
        """Validate and convert both xdata and ydata to numpy arrays."""
        for name, arg in [("xdata", xdata), ("ydata", ydata)]:
            if not isinstance(arg, list) or len(arg) < 2:
                raise ValueError(f"{name}: must be a 2D list with at least two rows")
            vals = []
            for i, row in enumerate(arg):
                if not isinstance(row, list) or len(row) == 0:
                    raise ValueError(f"{name} row {i}: must be a non-empty list")
                try:
                    vals.append(float(row[0]))
                except Exception:
                    raise ValueError(f"{name} row {i}: non-numeric value")
            if name == "xdata":
                x_arr = np.asarray(vals, dtype=np.float64)
            else:
                y_arr = np.asarray(vals, dtype=np.float64)

        if x_arr.shape[0] != y_arr.shape[0]:
            raise ValueError("xdata and ydata must have the same number of rows")
        return x_arr, y_arr

    # Model definitions dictionary
    models = {
        'reciprocal_log_power_law': {
            'params': ['a', 'b'],
            'model': lambda x, a, b: 1.0 / (a + b * np.log(x)),
            'guess': lambda xa, ya: (1.0, 1.0),
        },
        'log_offset_linear': {
            'params': ['a', 'b'],
            'model': lambda x, a, b: a + b * np.log(np.clip(x, 1e-9, None)),
            'guess': lambda xa, ya: (float(np.mean(ya)), 1.0),
        },
        'log_one_plus_x_linear': {
            'params': ['a', 'b'],
            'model': lambda x, a, b: a + b * np.log1p(np.clip(x, 0.0, None)),
            'guess': lambda xa, ya: (float(np.mean(ya)), 1.0),
        },
        'shifted_log_linear_decay': {
            'params': ['a', 'b', 'c'],
            'model': lambda x, a, b, c: a - b * np.log(np.clip(x + c, 1e-9, None)),
            'guess': lambda xa, ya: (float(np.mean(ya)), 1.0, float(np.min(xa))),
        },
        'simple_natural_log': {
            'params': ['A'],
            'model': lambda x, A: A * np.log(np.clip(x, 1e-9, None)),
            'guess': lambda xa, ya: (1.0,),
        },
        'pareto_power_law_single_param': {
            'params': ['A'],
            'model': lambda x, A: 1.0 - np.power(np.clip(x, 1e-9, None), -A),
            'guess': lambda xa, ya: (1.5,),
        },
        'pareto_power_law_with_scale': {
            'params': ['a', 'b'],
            'model': lambda x, a, b: 1.0 - np.power(b / np.clip(x, b, None), a),
            'guess': lambda xa, ya: (1.5, float(np.min(xa))),
            'bounds': (0.0, np.inf),
        },
        'pearson_type_iv_asymmetric_peak': {
            'params': ['y0', 'A', 'm', 'v', 'alpha', 'lam'],
            'model': lambda x, y0, A, m, v, alpha, lam: y0 + A * np.power(1.0 + np.square((x - lam) / alpha), -m) * np.exp(-v * np.arctan((x - lam) / alpha)),
            'guess': lambda xa, ya: (float(np.min(ya)), float(np.ptp(ya) if np.ptp(ya) else 1.0), 1.0, 0.0, float(max((np.max(xa) - np.min(xa)) / 6.0, 1e-3)), float(np.median(xa))),
            'bounds': ([-np.inf, -np.inf, 0.5, -np.inf, 0.0, -np.inf], np.inf),
        }
    }

    # Validate model parameter
    if stat_pareto_model not in models:
        return f"Invalid model: {str(stat_pareto_model)}. Valid models are: {', '.join(models.keys())}"

    model_info = models[stat_pareto_model]

    # Validate and convert input data
    try:
        x_arr, y_arr = _validate_data(xdata, ydata)
    except ValueError as e:
        return f"Invalid input: {e}"

    # Perform curve fitting
    try:
        p0 = model_info['guess'](x_arr, y_arr)
        bounds = model_info.get('bounds', (-np.inf, np.inf))
        if bounds == (-np.inf, np.inf):
            popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, maxfev=10000)
        else:
            popt, pcov = scipy_curve_fit(model_info['model'], x_arr, y_arr, p0=p0, bounds=bounds, maxfev=10000)

        fitted_vals = [float(v) for v in popt]
        for v in fitted_vals:
            if math.isnan(v) or math.isinf(v):
                return "Fitting produced invalid numeric values (NaN or inf)."
    except ValueError as e:
        return f"Initial guess error: {e}"
    except Exception as e:
        return f"curve_fit error: {e}"

    # Calculate standard errors
    std_errors = None
    try:
        if pcov is not None and np.isfinite(pcov).all():
            std_errors = [float(v) for v in np.sqrt(np.diag(pcov))]
    except Exception:
        pass

    return [model_info['params'], fitted_vals, std_errors] if std_errors else [model_info['params'], fitted_vals]

Online Calculator